Using homology relations within a database markedly boosts protein sequence similarity search.
نویسندگان
چکیده
Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
منابع مشابه
Similarity Search Using Pre-Search in UniRef100 Database
Sequence similarity in biological databases is used to characterize a newly discovered protein and confirming the existence of its homologs. This is often computationally very expensive. We have implemented a new algorithm that performs sequence similarity search using a pre-search phase. The proposed algorithm works in three phases. As a prepreparation for Pre-Search, we locate a sequence, sim...
متن کاملComputational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally.Objective: In the present study, we report the results of a systemic search for identifi cation of new miRNAs in B. rapa using homology-based ...
متن کاملHorA web server to infer homology between proteins using sequence and structural similarity
The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect struct...
متن کاملCrowdsourcing Protein Family Database Curation
We propose a novel method for crowdsourcing a protein family database. We discuss how we intend to identify novel groupings of proteins from user sequence similarity search, and how text mining will be applied to assist in annotation of these novel groupings, and more broadly as an enrichment of protein sequence similarity search results. We intend to use entity linking to identify literature w...
متن کاملPFMFind: A System for Discovery of Peptide Homology and Function
Protein Fragment Motif Finder (PFMFind) is a system that enables e cient discovery of relationships between short fragments of protein sequences using similarity search. It supports queries based on amino acid similarity matrices and position specific score matrices (PSSMs) obtained through an iterative procedure. PSSM construction is customisable through plugins written in Python. PFMFind cons...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Proceedings of the National Academy of Sciences of the United States of America
دوره 112 22 شماره
صفحات -
تاریخ انتشار 2015